Introduction


What are secondary data? Secondary data are data that
were collected by someone other than the user or that are used for
a purpose other than the original one. A wide range of
sources can be used as secondary data: censuses, information 
collected by government departments, organizational records 
and data that were originally collected for other research 
purposes [1-3]. Yee and Niemeier [4] discuss the benefits of 
longitudinal data as compared to repeated cross-sectional 
information. 


Use of repeated cross-sectional or longitudinal secondary
data to explore social and health issues can provide
comparative information about important
environmental issues. For example, social or health-related
information could be examined before, during and after the
current COVID-19 pandemic to gain some understanding of
the course and impact of the outbreak and to inform resource
allocation. Using secondary analyses of survey data collected
by the China CDC, Gao, et al. [5] were able to provide timely
information demonstrating geographical differences and
duration of coronavirus in health care workers in China.


Secondary data can answer two types of questions:
descriptive and analytical. That is, the information can
be used to describe events or trends, or it can be used to
examine relationships among variables cross-sectionally
or longitudinally. Numerous secondary databases exist and
many are available online (e.g., the European Bioinformatics
Institute database [6] provides a searchable database of
biologic sources that can be linked to survey data). The Centre
for Addiction and Mental Health (CAMH) conducts repeated
cross-sectional surveys of adults in Ontario, Canada (the CAMH
Monitor). The Monitor has been used both
descriptively and analytically and has provided important
information on a multitude of health behaviors and policies.
Examples


An analysis of CAMH Monitor data from 1996-2006 
provided important descriptive information about quitting 
smoking among individuals who were categorized as regular 
or occasional smokers. We found that the prevalence of having 
quit smoking for at least one year increased over time. In 
addition, females were more likely to show this increase than 
males, and older individuals more likely than younger ones 
[7]. These results provide us with the backdrop for examining 
additional questions in future research about why people quit, 
what programs might help people quit, and whether those who 
do quit are using new products that have become available 
such as e-cigarettes, waterpipes, smokeless tobacco and bidis. 
In addition, future research could be undertaken to explore 
whether methods of quitting have changed over time. Either 
survey questions could be developed to examine these issues 
or qualitative interviews could be used to supplement the 
information from the survey. 


CAMH Monitor data have also been used descriptively to 
analyze effects of new legislation or policies by examining 
trends before and after the introduction of the legislation or 
policy, such as the potential impact of legislation on motor 
vehicle collisions in Ontario among smokers and nonsmokers. 
Legislation was enacted in Ontario in 2006 to prohibit smoking 
in vehicles when children and adolescents were present. We 
found that before the law was enacted the rate of reported
collisions was higher among smokers than nonsmokers.
Following the enactment of the legislation the rate among
smokers decreased and there was no statistically significant difference
between smokers and nonsmokers [8]. What is not known
is whether drivers were in fact smoking while driving,
whether they were aware of the legislation, and whether their
driving and smoking patterns changed because of the legislation. Another
study, which examined cross-sectional CAMH data over time to assess
legislative effects, found that texting and driving declined after
the introduction of more severe penalties [9].
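
As a rough illustration of what "no statistical difference" between two groups can mean in a repeated cross-sectional comparison, the sketch below applies a pooled two-proportion z-test to collision counts before and after a policy change. The counts are invented placeholders, not CAMH Monitor figures, and the function name is hypothetical. The published analyses [8,9] may well have used other methods (for example, weighted regression models), so this is only a simplified sketch that ignores survey weights and design effects.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Pooled two-proportion z-test.

    Returns (z, two_sided_p) for the null hypothesis that groups A and B
    share the same underlying proportion. Hypothetical helper: it assumes
    simple random samples and ignores survey weights.
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative (made-up) counts of respondents reporting a collision.
# Before the legislation: smokers vs. nonsmokers.
z_before, p_before = two_proportion_z_test(90, 1000, 60, 1000)
# After the legislation: smokers vs. nonsmokers.
z_after, p_after = two_proportion_z_test(62, 1000, 60, 1000)

print(f"before: z = {z_before:.2f}, p = {p_before:.3f}")
print(f"after:  z = {z_after:.2f}, p = {p_after:.3f}")
```

Running the same test on each survey period separately mirrors the before-and-after logic described above. It does not by itself show that the legislation caused any change.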

Citation: Pederson LL, Vingilis E, Wickens CM, Koval J, Mann RE (2020) Use of secondary data analyses in research: Pros and Cons. J Addict Med Ther Sci 6(1): 058-060. DOI: https://dx.doi.org/10.17352/2455-3484.000039

Peertechz® Publications, https://www.peertechz.com/journals/journal-of-addiction-medicine-and-therapeutic-science


Other examples of the use of CAMH Monitor data to 
evaluate policy interventions include Wickens, et al. [9] who 
assessed the impact of legislation to increase penalties for 
distracted driving on rates of texting and driving, and Mann, 
et al. [10] who evaluated the impact of legislation introducing 
administrative sanctions for impaired driving in Ontario on rates
of driving after drinking in the province. These secondary
analyses can also be supplemented with qualitative interviews 
to provide some explanation and background for the original 
findings. 


Other types of secondary databases are longitudinal, in which
large samples of individuals are followed over a number of 
years. For example, Wiesenthal and Vingilis [11] analyzed 
the Canadian National Population Health Survey (NPHS) 
descriptively and analytically to examine trends over time and 
relationships among variables. Specifically, they examined 
trajectories of distress in participants after they reported being 
injured in a motor vehicle collision. The NPHS, a Statistics
Canada survey, is a repeated-measures longitudinal survey
that monitors the health and wellbeing of 20,000 Canadians.
Participants were interviewed biennially from 1994/95 to
2002/03 (5 waves of interviews over a 9-year span). Because of
the longitudinal nature of the secondary database, hierarchical
linear modelling was used to identify within-person trends.
Men experienced greater overall distress over time than
women and a greater increase in distress over time. Moreover,
the level of pre-injury distress predicted post-injury distress. 
This study revealed more complex and nuanced relationships 
among variables in their prediction of post-motor vehicle 
injury psychological distress. This secondary database provided 
numerous benefits. First, motor vehicle injuries are rare events; 
however, a sample of 20,000 individuals interviewed over 9 
years provided enough cases of motor vehicle injury to examine 
the effects of injuries on distress. Additionally, evidence was 
mixed on whether pre-morbid distress predicted post-injury 
distress as all previous studies only had retrospective data on 
pre-injury distress levels. The use of a longitudinal secondary 
database provided information on distress levels before the 
injury occurred. The large sample size of injured individuals in 
this secondary database allowed for examination of mediators 
and moderators of the effects. 


Finally, secondary data can be administrative data, that is, 
official records, such as hospital or police records. For example,
an evaluation of new stunt driving legislation, using stunt driving
charges and collision casualty statistics, identified a decrease
in charges and collision casualties among young males after
the 2007 street racing legislation was introduced [12,13]. In
addition, different types of secondary data can complement
each other. Hospital and police records
can identify cases where individuals were apprehended or
injured severely enough to go to hospital, while self-report
data identify cases that might be missed by official
records.


Discussion 


Of course, there are some important factors that need to be 
considered in the use of secondary data. 


Pros: First, there is much information available that has 
been collected in the past. This information can be used 
to make important contributions to knowledge, provide 
recommendations for policy, and provide the backdrop for 
future research. 


Second, because the information is already available, 
subsequent research can be conducted in a timely manner, 
without the longer timelines for submitting proposals for 
funding and collecting original data. This is particularly
salient because events such as the introduction
of policies or historical events like the current COVID-19
pandemic often happen before researchers have any opportunity to
prepare to collect the information needed to evaluate
their impact. Third, large sample sizes are often available with
secondary datasets, which is particularly important when 
investigating rare events. Moreover, certain types of secondary 
data have added benefits. For example, longitudinal secondary 
datasets have increased statistical power and can estimate a 
greater range of conditional probabilities compared to repeated 
cross-sectional secondary datasets [4]. 


The use of secondary data also gives researchers who have 
conducted the original surveys additional information that they 
can use to justify continuation of their original research. For 
example, there is strong epidemiological evidence connecting 
cannabis use to collision risk [13-16] that has spurred and 
informed experimental simulation studies examining precisely 
how cannabis affects driving [18,19]. 


Cons: As noted, secondary data may not provide all of 
the information of interest. Questions may not be worded 
as precisely as we would like to answer specific questions of 
interest. Analyses become more complicated if the question 
wording or methods of administration vary. In these cases, 
it is particularly difficult to decide how information from a 
range of years can be considered together. It is also critical 
to understand how the information was originally collected. 
Response rates to surveys have decreased over time, calling 
into question how representative the responses might be, 
which must be considered in the interpretation of secondary 
analyses. However, many well-designed surveys include
sampling weights to counter the biases that may arise from
non-representative sampling. Longitudinal secondary datasets
can suffer from attrition, although this is sometimes addressed 
by replacing lost respondents [4]. 
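
To make the role of sampling weights concrete, the sketch below compares an unweighted and a weighted prevalence estimate. The responses and weights are invented for illustration and the function name is hypothetical. Real surveys document their own weighting schemes, and variance estimation under a complex design needs more than is shown here.

```python
def weighted_prevalence(responses, weights):
    """Weighted proportion of respondents with the outcome.

    responses: iterable of 0/1 (or False/True) outcome indicators.
    weights:   sampling weights, one per respondent (assumed positive).
    Hypothetical helper for illustration only.
    """
    responses = list(responses)
    weights = list(weights)
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    # Sum the weights of respondents who report the outcome.
    weighted_cases = sum(w for r, w in zip(responses, weights) if r)
    return weighted_cases / total_weight


# Made-up example: four respondents, two of whom report the outcome.
responses = [1, 0, 0, 1]
# Respondents from under-sampled groups carry larger weights.
weights = [1.0, 3.0, 2.0, 2.0]

unweighted = sum(responses) / len(responses)
weighted = weighted_prevalence(responses, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

When the groups that are over- or under-represented in the sample differ on the outcome, the two estimates diverge, which is why the weights supplied with a secondary dataset should normally be applied.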


Online surveys are limited to those with access to the
technology, may target sub-groups that are not the groups of
interest for a secondary analysis, and are correlational,
precluding cause-and-effect conclusions. Finally, ethics
approval may be required if the information is being used for a
purpose not originally proposed.


Conclusion 


It is important to note the limitations when
presenting information from secondary data, along with
their potential impact on the interpretation of the results.
Nevertheless, secondary analysis can make important
contributions to knowledge as well as provide directions for
future research and programs. Tripathy (2013) [20] notes
that while secondary data analysis can make important
contributions to knowledge, it is important to follow specific
guidelines in the use of such information, one of the most
important being anonymization of the information.